AITopics | historical event

historical event

5c5bc3553815adb4d1a8a5b8701e41a9-Paper-Conference.pdf

Neural Information Processing Systems | Feb-12-2026, 08:37:07 GMT

artificial intelligence, machine learning, natural language, (19 more...)

Neural Information Processing Systems

Country:

North America > United States > California (0.14)
Asia > China > Liaoning Province > Shenyang (0.04)
Asia > Myanmar > Tanintharyi Region > Dawei (0.04)

Genre:

Workflow (0.46)
Research Report (0.46)

Industry:

Information Technology (0.68)
Government (0.46)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Data Science (0.94)
Information Technology > Communications > Networks (0.67)
(3 more...)

Do Large Language Models (LLMs) Understand Chronology?

Wongchamcharoen, Pattaraphon Kenny, Glasserman, Paul

arXiv.org Artificial Intelligence | Nov-20-2025

Large language models (LLMs) are increasingly used in finance and economics, where prompt-based attempts against look-ahead bias implicitly assume that models understand chronology. We test this fundamental question with a series of chronological ordering tasks with increasing complexities over facts the model already knows from pre-training. Our tasks cover (1) chronological ordering, (2) conditional sorting (filter, then order), and (3) anachronism detection. We evaluate GPT-4.1, Claude-3.7 Sonnet, with and without Extended Thinking (ET), and GPT-5 across multiple reasoning-effort settings. Across models, Exact match rate drops sharply as sequences lengthen even while rank correlations stay high as LLMs largely preserve local order but struggle to maintain a single globally consistent timeline. In conditional sorting, most failures stem from the filtering step rather than the ordering step, but GPT-5 and Claude-3.7 Sonnet with Extended Thinking outshine normal models significantly. Lastly, anachronism detection is found to be the easiest task for the LLMs but performance still declines with increasingly overlapping timelines or entities. Overall, our main contribution is showing that allocating explicit reasoning budget helps with chronological ordering with GPT-5 at medium/high reasoning effort achieving flawless ordering at all lengths and perfect conditional sorting (both self-filtered and given-subset), whereas low/minimal effort degrades with longer lists, mirroring earlier models. Our findings delineate limits of current LLMs on chronological tasks, providing insights into task complexity, and demonstrate scenarios in which reasoning helps. These patterns are important for the real-time application of LLMs in finance. We release all code and evaluation templates to support full reproducibility.
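The contrast the abstract draws between exact match and rank correlation can be illustrated with a small sketch. This is not the authors' released evaluation code: the function names `exact_match` and `spearman_rho` and the five-event list are hypothetical, and the sketch assumes every event appears exactly once in both orderings (no ties).

```python
from typing import Hashable, Sequence


def exact_match(predicted: Sequence[Hashable], gold: Sequence[Hashable]) -> bool:
    """True only if the whole predicted timeline equals the gold timeline."""
    return list(predicted) == list(gold)


def spearman_rho(predicted: Sequence[Hashable], gold: Sequence[Hashable]) -> float:
    """Spearman rank correlation between two orderings of the same unique items."""
    n = len(gold)
    if n < 2:
        raise ValueError("need at least two items to compare orderings")
    if len(set(gold)) != n or sorted(map(str, predicted)) != sorted(map(str, gold)):
        raise ValueError("both orderings must contain the same unique items")
    # Position of each item in the gold timeline.
    gold_rank = {item: i for i, item in enumerate(gold)}
    # Sum of squared rank differences (the tie-free Spearman formula).
    d2 = sum((i - gold_rank[item]) ** 2 for i, item in enumerate(predicted))
    return 1 - 6 * d2 / (n * (n * n - 1))


# Hypothetical example: one adjacent swap at the end of a five-event list.
gold = [
    "Magna Carta",
    "Gutenberg printing press",
    "French Revolution",
    "Moon landing",
    "Fall of the Berlin Wall",
]
predicted = [
    "Magna Carta",
    "Gutenberg printing press",
    "French Revolution",
    "Fall of the Berlin Wall",
    "Moon landing",
]

print(exact_match(predicted, gold))   # False
print(spearman_rho(predicted, gold))  # 0.9
```

A single adjacent swap already fails exact match while the rank correlation stays at 0.9, which is consistent with the reported pattern of preserved local order without a fully consistent global timeline.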

large language model, machine learning, president, (21 more...)

arXiv.org Artificial Intelligence

2511.14214

Country:

Europe > United Kingdom (0.93)
North America > United States > California (0.46)

Genre: Research Report > New Finding (0.87)

Industry: Government > Regional Government > North America Government > United States Government (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

ROBOPSY PL[AI]: Using Role-Play to Investigate how LLMs Present Collective Memory

Jahrmann, Margarete, Brandstetter, Thomas, Glasauer, Stefan

arXiv.org Artificial Intelligence | Oct-14-2025

The paper presents the first results of an artistic research project investigating how Large Language Models (LLMs) curate and present collective memory. In a public installation exhibited for two months in Vienna in 2025, visitors could interact with five different LLMs (ChatGPT with GPT 4o and GPT 4o mini, Mistral Large, DeepSeek-Chat, and a locally run Llama 3.1 model), which were instructed to act as narrators, implementing a role-playing game revolving around the murder of Austrian philosopher Moritz Schlick in 1936. Results of the investigation include protocols of LLM-user interactions during the game and qualitative conversations after the play experience to gain insight into the players' reactions to the game. In a quantitative analysis, 115 introductory texts for role-playing generated by the LLMs were examined using different methods of natural language processing, including semantic similarity and sentiment analysis. While the qualitative player feedback made it possible to distinguish three types of users, the quantitative text analysis showed significant differences between how the different LLMs presented the historical content. Our study thus adds to ongoing efforts to analyse LLM performance, but also suggests how these efforts can be disseminated in a playful way to a general audience.

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2510.09874

Country: Europe > Austria > Vienna (0.36)

Genre: Research Report (1.00)

Industry:

Leisure & Entertainment > Games > Computer Games (1.00)
Health & Medicine > Therapeutic Area (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

TempME: Towards the Explainability of Temporal Graph Neural Networks via Motif Discovery

Neural Information Processing Systems | Oct-8-2025, 18:28:41 GMT

Temporal graphs are widely used to model dynamic systems with time-varying interactions.

artificial intelligence, information management, machine learning, (17 more...)

Neural Information Processing Systems

Country:

North America > United States > California (0.14)
Asia > China > Liaoning Province > Shenyang (0.04)
Asia > Myanmar > Tanintharyi Region > Dawei (0.04)

Genre:

Workflow (0.46)
Research Report (0.46)

Industry:

Information Technology (0.68)
Government (0.46)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(2 more...)

HistoryBankQA: Multilingual Temporal Question Answering on Historical Events

Mandal, Biswadip, Khandelwal, Anant, Gupta, Manish

arXiv.org Artificial Intelligence | Sep-17-2025

Temporal reasoning about historical events is a critical skill for NLP tasks like event extraction, historical entity linking, temporal question answering, timeline summarization, temporal event clustering and temporal natural language inference. Yet efforts on benchmarking temporal reasoning capabilities of large language models (LLMs) are rather limited. Existing temporal reasoning datasets are limited in scale, lack multilingual coverage and focus more on contemporary events. To address these limitations, we present HistoryBank, a multilingual database of 10M+ historical events extracted from Wikipedia timeline pages and article infoboxes. Our database provides unprecedented coverage in both historical depth and linguistic breadth with 10 languages. Additionally, we construct a comprehensive question answering benchmark for temporal reasoning across all languages. This benchmark covers a diverse set of 6 temporal QA reasoning tasks, and we evaluate a suite of popular language models (LLaMA-3-8B, Mistral-7B, Gemma-2-9b, Qwen3-8B, GPT4o) to assess their performance on these tasks. As expected GPT4o performs best across all answer types and languages; Gemma-2 outperforms the other small language models. Our work aims to provide a comprehensive resource for advancing multilingual and temporally-aware natural language understanding of historical events. To facilitate further research, we will make our code and datasets publicly available upon acceptance of this paper.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2509.1272

Country:

South America (1.00)
North America > United States (1.00)
Asia > Middle East (1.00)
(2 more...)

Genre: Research Report (0.81)

Industry:

Leisure & Entertainment > Sports (1.00)
Government > Regional Government > North America Government > United States Government (1.00)
Media (0.68)
Law (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.89)

Geopolitical biases in LLMs: what are the "good" and the "bad" countries according to contemporary language models

Salnikov, Mikhail, Korzh, Dmitrii, Lazichny, Ivan, Karimov, Elvir, Iudin, Artyom, Oseledets, Ivan, Rogov, Oleg Y., Loukachevitch, Natalia, Panchenko, Alexander, Tutubalina, Elena

arXiv.org Artificial Intelligence | Jun-23-2025

This paper evaluates geopolitical biases in LLMs with respect to various countries through an analysis of their interpretation of historical events with conflicting national perspectives (USA, UK, USSR, and China). We introduce a novel dataset with neutral event descriptions and contrasting viewpoints from different countries. Our findings show significant geopolitical biases, with models favoring specific national narratives. Additionally, simple debiasing prompts had a limited effect in reducing these biases. Experiments with manipulated participant labels reveal models' sensitivity to attribution, sometimes amplifying biases or recognizing inconsistencies, especially with swapped labels. This work highlights national narrative biases in LLMs, challenges the effectiveness of simple debiasing methods, and offers a framework and dataset for future geopolitical bias research.

large language model, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2506.06751

Country:

North America (1.00)
Europe (1.00)
Asia > Russia (1.00)
(2 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Government > Regional Government > Europe Government > Russia Government (0.46)
Government > Regional Government > Asia Government > Russia Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.50)

MADial-Bench: Towards Real-world Evaluation of Memory-Augmented Dialogue Generation

He, Junqing, Zhu, Liang, Wang, Rui, Wang, Xi, Haffari, Reza, Zhang, Jiaxing

arXiv.org Artificial Intelligence | Oct-23-2024

Long-term memory is important for chatbots and dialogue systems (DS) to create consistent and human-like conversations, evidenced by numerous developed memory-augmented DS (MADS). To evaluate the effectiveness of such MADS, existing commonly used evaluation metrics, like retrieval accuracy and perplexity (PPL), mainly focus on query-oriented factualness and language quality assessment. However, these metrics often lack practical value. Moreover, the evaluation dimensions are insufficient for human-like assessment in DS. Regarding memory-recalling paradigms, current evaluation schemes only consider passive memory retrieval while ignoring diverse memory recall with rich triggering factors, e.g., emotions and surroundings, which can be essential in emotional support scenarios. To bridge the gap, we construct a novel Memory-Augmented Dialogue Benchmark (MADial-Bench) covering various memory-recalling paradigms based on cognitive science and psychology theories. The benchmark assesses two tasks separately: memory retrieval and memory recognition with the incorporation of both passive and proactive memory recall data. We introduce new scoring criteria to the evaluation, including memory injection, emotion support (ES) proficiency, and intimacy, to comprehensively assess generated responses. Results from cutting-edge embedding models and large language models on this benchmark indicate the potential for further advancement. Extensive testing further reveals correlations between memory injection, ES proficiency, and intimacy.

dialogue, emotion, evaluation, (15 more...)

arXiv.org Artificial Intelligence

2409.1524

Country:

Asia > China > Guangdong Province > Shenzhen (0.04)
Oceania > Australia > Victoria > Melbourne (0.04)
Europe > United Kingdom > England > South Yorkshire > Sheffield (0.04)
Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)

Genre: Research Report (0.50)

Industry:

Health & Medicine (0.46)
Energy (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.72)

Past Meets Present: Creating Historical Analogy with Large Language Models

Li, Nianqi, Yuan, Siyu, Chen, Jiangjie, Liang, Jiaqing, Wei, Feng, Liang, Zujie, Yang, Deqing, Xiao, Yanghua

arXiv.org Artificial Intelligence | Sep-23-2024

Historical analogies, which compare known past events with contemporary but unfamiliar events, are important abilities that help people make decisions and understand the world. However, research in applied history suggests that people have difficulty finding appropriate analogies. And previous studies in the AI community have also overlooked historical analogies. To fill this gap, in this paper, we focus on the historical analogy acquisition task, which aims to acquire analogous historical events for a given event. We explore retrieval and generation methods for acquiring historical analogies based on different large language models (LLMs). Furthermore, we propose a self-reflection method to mitigate hallucinations and stereotypes when LLMs generate historical analogies. Through human evaluations and our specially designed automatic multi-dimensional assessment, we find that LLMs generally have a good potential for historical analogies. And the performance of the models can be further improved by using our self-reflection method.

analogy, historical analogy, historical event, (13 more...)

arXiv.org Artificial Intelligence

2409.1482

Country:

Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
Asia > Middle East > Iraq (0.04)
Asia > China > Hong Kong (0.04)
(15 more...)

Genre: Research Report > New Finding (0.67)

Industry:

Law (1.00)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.72)

A Comprehensive Evaluation of Large Language Models on Temporal Event Forecasting

Chang, He, Ye, Chenchen, Tao, Zhulin, Wu, Jie, Yang, Zhengmao, Ma, Yunshan, Huang, Xianglin, Chua, Tat-Seng

arXiv.org Artificial Intelligence | Jul-16-2024

Recently, Large Language Models (LLMs) have demonstrated great potential in various data mining tasks, such as knowledge question answering, mathematical reasoning, and commonsense reasoning. However, the reasoning capability of LLMs on temporal event forecasting has been under-explored. To systematically investigate their abilities in temporal event forecasting, we conduct a comprehensive evaluation of LLM-based methods for temporal event forecasting. Due to the lack of a high-quality dataset that involves both graph and textual data, we first construct a benchmark dataset, named MidEast-TE-mini. Based on this dataset, we design a series of baseline methods, characterized by various input formats and retrieval augmented generation (RAG) modules. From extensive experiments, we find that directly integrating raw texts into the input of LLMs does not enhance zero-shot extrapolation performance. In contrast, incorporating raw texts in specific complex events and fine-tuning LLMs significantly improves performance. Moreover, enhanced with retrieval modules, LLMs can effectively capture temporal relational patterns hidden in historical events. Meanwhile, issues such as popularity bias and the long-tail problem still persist in LLMs, particularly in the RAG-based method. These findings not only deepen our understanding of LLM-based event forecasting methods but also highlight several promising research directions. We consider that this comprehensive evaluation, along with the identified research opportunities, will significantly contribute to future research on temporal event forecasting through LLMs.

atomic event, event forecasting, forecasting, (15 more...)

arXiv.org Artificial Intelligence

2407.11638

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.14)
North America > United States > District of Columbia > Washington (0.05)
Asia > Middle East > Israel (0.05)
(8 more...)

Genre: Research Report (1.00)

Industry: Government (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

The Factuality Tax of Diversity-Intervened Text-to-Image Generation: Benchmark and Fact-Augmented Intervention

Wan, Yixin, Wu, Di, Wang, Haoran, Chang, Kai-Wei

arXiv.org Artificial Intelligence | Jun-29-2024

Prompt-based "diversity interventions" are commonly adopted to improve the diversity of Text-to-Image (T2I) models depicting individuals with various racial or gender traits. However, will this strategy result in nonfactual demographic distribution, especially when generating real historical figures? In this work, we propose DemOgraphic FActualIty Representation (DoFaiR), a benchmark to systematically quantify the trade-off between using diversity interventions and preserving demographic factuality in T2I models. DoFaiR consists of 756 meticulously fact-checked test instances to reveal the factuality tax of various diversity prompts through an automated evidence-supported evaluation pipeline. Experiments on DoFaiR unveil that diversity-oriented instructions increase the number of different gender and racial groups in DALLE-3's generations at the cost of historically inaccurate demographic distributions. To resolve this issue, we propose Fact-Augmented Intervention (FAI), which instructs a Large Language Model (LLM) to reflect on verbalized or retrieved factual information about gender and racial compositions of generation subjects in history, and incorporate it into the generation context of T2I models. By orienting model generations using the reflected historical truths, FAI significantly improves the demographic factuality under diversity interventions while preserving diversity.

diversity intervention, factuality, intervention, (14 more...)

arXiv.org Artificial Intelligence

2407.00377

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.14)
Europe > United Kingdom (0.14)
Asia > China (0.05)
(4 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)
